LSTM-Based Machine Translation for Madurese-Indonesian
Authors
Abstract
Madurese is one of the regional languages of Indonesia and is dominant in East Java, on Madura Island in particular. Its use as a daily language has declined significantly as children and adolescents shift away from it, partly because of a sense of prestige and the difficulty of learning Madurese. The scarcity of research and scientific works on Madurese also contributes to declining literacy in the language. Our work focuses on creating a Madurese-Indonesian machine translation system to maintain and preserve the existence of Madurese, so that this can be done through digital media. This study presents the latest Madurese-Indonesian dataset, a corpus of 30,000 Madurese-Indonesian sentence pairs taken from the online Bible. We scraped Bible pages and organized them into a bilingual corpus. We then manually processed the text to match the two languages' scraping results and applied normalization and tokenization to remove non-printable characters and punctuation from the corpus. To perform neural machine translation (NMT), we connected an RNN encoder to a decoder model; for training and testing we used a sequential LSTM model, and BLEU was used to assess the accuracy of the results. Using the SoftMax function and the Adam optimizer, with additional settings including 128 layers and an added Dropout layer, we obtained average scores over the five tests conducted of BLEU-1 0.798068, BLEU-2 0.680932, BLEU-3 0.623489, and BLEU-4 0.523546. Given the differences between Madurese and Indonesian, this is the best approach.
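The abstract says the scraped Bible pages were organized into a bilingual corpus and the two languages' results were then matched manually. Because Bible text is divided into numbered verses, one natural way to pre-pair sentences before a manual check is to join the two scrapes on the verse reference. The sketch below assumes each scrape yields a dictionary keyed by a hypothetical `(book, chapter, verse)` tuple; `align_verses` and `VerseKey` are illustrative names, not the authors' code.

```python
from typing import Dict, List, Tuple

# Hypothetical verse key: (book, chapter, verse). The paper does not state how
# the scraped verses were keyed; this is an illustrative assumption.
VerseKey = Tuple[str, int, int]


def align_verses(
    madurese: Dict[VerseKey, str],
    indonesian: Dict[VerseKey, str],
) -> List[Tuple[str, str]]:
    """Pair verses present in both scraped translations, in sorted key order."""
    # Only verses that appear in both languages can form a sentence pair.
    shared = sorted(madurese.keys() & indonesian.keys())
    pairs: List[Tuple[str, str]] = []
    for key in shared:
        src, tgt = madurese[key].strip(), indonesian[key].strip()
        # Assumed filter: drop pairs where either side is empty after scraping.
        if src and tgt:
            pairs.append((src, tgt))
    return pairs
```

Pairs produced this way would still need the manual inspection the abstract describes, for example where one translation merges or splits verses.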
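The preprocessing step is named (normalization, tokenization, removal of non-printable characters and punctuation) but its exact rules are not given. A minimal sketch, assuming NFKC Unicode normalization, lowercasing, deletion of ASCII punctuation via `string.punctuation`, and whitespace tokenization; `clean_sentence` is a hypothetical helper.

```python
import string
import unicodedata
from typing import List

# Translation table that deletes ASCII punctuation (assumption: the paper does
# not list which punctuation marks were removed).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_sentence(text: str) -> List[str]:
    """Normalize, strip non-printable characters and punctuation, then tokenize."""
    # Unicode normalization (NFKC assumed) folds compatibility forms, then lowercase.
    text = unicodedata.normalize("NFKC", text).lower()
    # Non-printable whitespace (tabs, newlines) becomes a space so words stay
    # separated; other non-printable characters (control codes, zero-width
    # marks) are dropped.
    text = "".join(
        ch if ch.isprintable() else (" " if ch.isspace() else "") for ch in text
    )
    # Remove punctuation, then split on runs of whitespace.
    return text.translate(_PUNCT_TABLE).split()
```

Note that `string.punctuation` includes the apostrophe; a real pipeline may want to keep it if it carries meaning in the orthography being processed.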
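The abstract reports BLEU-1 through BLEU-4 without saying how they were computed. A common convention is cumulative BLEU-N: the geometric mean of clipped n-gram precisions for orders 1..N with uniform weights, times a brevity penalty, i.e. BLEU-N = BP · exp((1/N) Σ log p_n), with BP = 1 if the hypothesis length c exceeds the reference length r and exp(1 − r/c) otherwise. The sketch below assumes that convention, one reference per sentence, and no smoothing; `ngrams` and `corpus_bleu` are hypothetical names.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references: List[List[str]], hypotheses: List[List[str]], max_n: int) -> float:
    """Cumulative BLEU-max_n with uniform weights and one reference per sentence."""
    matched = [0] * max_n
    total = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipped counts: a hypothesis n-gram is credited at most as often
            # as it occurs in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0  # no smoothing: any zero precision yields BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty punishes hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

Calling `corpus_bleu(refs, hyps, n)` for n = 1..4 yields four scores of the same form as those reported. NLTK's `nltk.translate.bleu_score.corpus_bleu` with cumulative `weights` is the usual library alternative, though its handling of zero counts and its smoothing options differ from this sketch.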
Similar resources
Rule-based Machine Translation between Indonesian and Malaysian
We describe the development of a bidirectional rule-based machine translation system between Indonesian and Malaysian (id-ms), two closely related Austronesian languages natively spoken by approximately 35 million people. The system is based on the re-use of free and publicly available resources, such as the Apertium machine translation platform and Wikipedia articles. We also present our appro...
LSTM Neural Reordering Feature for Statistical Machine Translation
Artificial neural networks are powerful models, which have been widely applied to many aspects of machine translation, such as language modeling and translation modeling. Though notable improvements have been made in these areas, the reordering problem still remains a challenge in statistical machine translation. In this paper, we present a novel neural reordering model that directly models ...
An Analysis of Indonesian Language for Interlingual Machine-Translation System
This paper presents BIAS (Bahasa Indonesia Analyzer System), an analysis system for the Indonesian language suitable for a multilingual machine translation system. BIAS is developed with a motivation to contribute to an ongoing cooperative research project in machine translation between Indonesia and other Asian countries. In addition, it may serve to foster NLP research in Indonesia. It starts with an overview of va...
Handling Indonesian Clitics: A Dataset Comparison for an Indonesian-English Statistical Machine Translation System
In this paper, we study the effect of incorporating morphological information on an Indonesian (id) to English (en) Statistical Machine Translation (SMT) system as part of a preprocessing module. The linguistic phenomenon that is being addressed here is Indonesian cliticized words. The approach is to transform the text by separating the correct clitics from a cliticized word to simplify the wor...
Toward Asian Speech Translation System: Developing Speech Recognition and Machine Translation for Indonesian Language
In this paper, we present a report on the research and development of a speech-to-speech translation system for Asian languages, primarily on the design and implementation of speech recognition and machine translation systems for the Indonesian language. As part of the A-STAR project, each participating country will need to develop each component of the full system for the corresponding language. We w...
Journal
Journal title: Journal of Applied Data Sciences
Year: 2023
ISSN: 2723-6471
DOI: https://doi.org/10.47738/jads.v4i3.113